Why R?
- Open source
- Worldwide community
- Free
- Used by organizations like Google, the New York Times, and the Financial Times
Let’s make sure we understand how to install R and how to load libraries.
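As a minimal sketch of the install-once, load-every-session pattern: the package names below are the two used later in these slides, and the `install.packages()` line is left commented out on the assumption that you run it only the first time, with an internet connection.

```r
# Install once per computer (needs an internet connection).
# Uncomment the next line the first time you use these packages.
# install.packages(c("plotly", "psych"))

# Check which of the needed packages are already installed
needed <- c("plotly", "psych")
is_installed <- needed %in% rownames(installed.packages())
names(is_installed) <- needed
print(is_installed)

# Load a package in every R session where you want to use it
if (is_installed["psych"]) library(psych)
```

Installing puts the package on your computer; `library()` makes it available in the current session, so it belongs at the top of each script.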
Why R? (2)
library(plotly) # this is a helpful comment
p <- plot_ly(GSS2014,             # the data I am using
             x = ~coninc,         # income in CONSTANT $
             y = ~depress,        # depression
             color = ~health,     # color by poor health
             z = ~health,         # poor health
             type = "scatter3d",  # 3D scatterplot
             mode = "markers")    # draw points only
p # print the object to display the plot
What are we doing???
- Start R
- Get some data in it
- A few descriptive statistics
- Some graphs
Using General Social Survey (GSS) for Example
General Social Survey: Nationally representative sample collected annually or biennially since 1972.
Download the data from Canvas to your Mac or Windows desktop, then start RStudio and open the file in R.
Data Are Just Rows and Columns
We use both the codebook and the data set.
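To make the rows-and-columns idea concrete, here is a small sketch with a made-up data frame. The column names `sex` and `coninc` mirror variables used in these slides, but the values are invented for illustration and are not GSS data.

```r
# A toy data frame: each row is a respondent, each column is a variable
toy <- data.frame(
  sex    = c(1, 2, 2, 1),               # hypothetical codes, as in a codebook
  coninc = c(17551, 33255, NA, 60967)   # hypothetical incomes; NA = missing
)

print(dim(toy))     # number of rows, then number of columns
print(names(toy))   # the variable (column) names
str(toy)            # the type of each column
print(head(toy, 2)) # the first two rows
```

The codebook tells you what each column name and each numeric code means; the data frame only stores the values.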

Scripting (R Syntax)

Get Data (Script)
# local file
# make sure you are in the right directory
# Menu: Session | Set Working Directory
load("GSS2014.Rdata")
You can also load data from the menu.
- Loading your data is sometimes the hardest part.
- load() is for data that is ALREADY IN R FORMAT!
- Pay attention to WHERE your data and scripts live.
- Note that R uses forward slashes in file paths: /
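The points above can be tried out without any real data. This sketch saves a made-up data frame in R format to a temporary folder and loads it back; the object name `toy` and the file name `toy.RData` are hypothetical.

```r
# Hypothetical toy data standing in for a real data set
toy <- data.frame(id = 1:3, income = c(20000, 35000, 50000))

# Save it in R's own format; file.path() joins the pieces with /
rdata_file <- file.path(tempdir(), "toy.RData")
save(toy, file = rdata_file)

rm(toy)                        # remove the object from the workspace
loaded <- load(rdata_file)     # load() restores objects under their saved names
print(loaded)                  # the name(s) of the restored objects
print(getwd())                 # the current working directory
print(file.exists(rdata_file)) # check a path before trying to load it
```

Note that `load()` does not return the data itself; it puts the saved objects back into the workspace and returns their names.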
The R Interface

Measures of Central Tendency
- What are the mean, median, and mode? Why are they different?
summary(GSS2014$coninc) # quartiles, mean, and count of missing values
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
##    369.5  17551.2  33255.0  48603.3  60967.5 160742.2      224
library(psych) # to load psych
describe(GSS2014$coninc)
##    vars    n     mean       sd median  trimmed      mad   min      max    range
## X1    1 2314 48603.29 43340.89  33255 40902.37 28760.59 369.5 160742.2 160372.7
##    skew kurtosis     se
## X1 1.42     1.34 900.98
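To see why the mean, median, and mode can differ, here is a small sketch on a made-up, right-skewed income vector (the numbers are invented, not GSS values). Base R has no built-in function for the statistical mode, so the sketch finds the most frequent value with `table()`; that is one common workaround, not the only one.

```r
# Hypothetical incomes: one very large value and one missing value
income <- c(12000, 18000, 18000, 25000, 31000, 40000, 150000, NA)

mean_value   <- mean(income, na.rm = TRUE)    # pulled up by the large value
median_value <- median(income, na.rm = TRUE)  # the middle value

# Mode: the most frequent value (table() drops NA by default)
counts     <- table(income)
mode_value <- as.numeric(names(counts)[which.max(counts)])

print(c(mean = mean_value, median = median_value, mode = mode_value))
```

Here the mean is 42000, the median is 25000, and the mode is 18000: a single high income drags the mean well above the median, the same pattern the skewed GSS income variable shows.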
We End With a Graph
hist(GSS2014$coninc,
     col = "blue") # histogram of the income variable
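A histogram is easier to read with a title and an axis label. This sketch uses simulated incomes (an assumption made so that it runs without the GSS file) and also keeps the object that `hist()` returns, which holds the bin counts.

```r
set.seed(1)                             # make the simulation repeatable
income <- rexp(200, rate = 1 / 40000)   # simulated, right-skewed incomes

h <- hist(income,
          col  = "blue",
          main = "Simulated income",    # plot title
          xlab = "Income in dollars")   # x-axis label

print(h$breaks)       # the edges of the bins
print(sum(h$counts))  # the bin counts add up to the number of observations
```

The same `main` and `xlab` arguments can be added to the GSS histogram above.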

And Another
pie(table(GSS2014$sex),
    labels = c("male", "female"),
    col = c("blue", "gold"))
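Typing the labels by hand works only if they are in the same order as the table. A sketch of one alternative, on made-up data, attaches the labels to the codes first so the chart picks them up automatically; the coding 1 = male, 2 = female is an assumption here and should be checked against the codebook.

```r
# Hypothetical coded responses (assumed coding: 1 = male, 2 = female)
sex_code <- c(1, 2, 2, 1, 2)

# Attach the labels to the codes once, in the codebook's order
sex <- factor(sex_code, levels = c(1, 2), labels = c("male", "female"))

counts <- table(sex)   # counts are now named "male" and "female"
print(counts)

pie(counts, col = c("blue", "gold"))  # labels come from the table names
```

With labeled counts, the slices cannot be mislabeled if the order of the categories changes.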

Questions?